Who are we wrt R?
Wherever you are, you’re not alone! As we begin learning R (or learning new things in R), remember…
2024-01-22
R is the computational engine; RStudio is the interface
Projects allow RStudio to leave notes for itself (e.g., history), will always start a new R session when opened, and will always set the working directory to the Project directory.
Create a system for organizing the objects in this project!
Functions are the “verbs” that allow us to manipulate data. Packages contain functions, and all functions belong to packages.
R comes with about 30 packages (“base R”). There are over 10,000 user-contributed packages; you can discover these packages online on the Comprehensive R Archive Network (CRAN), with more in active development on GitHub.
To use a package, install it once, in either of two ways:
- In RStudio's package installer, enter tidyverse (or a different package name), then click on Install.
- Run install.packages("tidyverse") in the console.

In each new R session, you’ll have to load the package if you want access to its functions: e.g., type library(tidyverse).
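The install-once, load-every-session pattern can be sketched as follows. The requireNamespace() check is an addition that is not in the slides; it only reports whether the package is already installed, and the install line is left commented out so that nothing is downloaded by accident.

```r
# Package to use; "tidyverse" is the example named in the text.
pkg <- "tidyverse"

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise.
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (!is_installed) {
  # Run once per machine (needs an internet connection):
  # install.packages(pkg)
  message("Package '", pkg, "' is not installed yet.")
} else {
  # Run in every new R session that needs the package's functions.
  # character.only = TRUE is needed because the name is stored in a variable.
  library(pkg, character.only = TRUE)
}
```

In everyday use, typing library(tidyverse) at the top of a script is enough; the check above is only useful when sharing code with someone who may not have the package yet.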
- # demarcates code comments
- <- is the assignment operator, how we name new objects in the R environment

R has two native data formats:
- RDS (a single object): readRDS("path/filename.RDS"), saveRDS(object, file = "path/filename.RDS")
- RData (one or more objects): load("path/filename.RData"), save(object1, object2, file = "path/filename.RData"), save.image("path/filename.RData")

You can import any data format if you know the right command (and package):
- CSV: read.csv (base R), read_csv (tidyverse)
- Excel: read_excel (readxl)
- Stata: read.dta (foreign), read_dta (haven)

Primary data types include numeric, integer, logical, and character; plus factors.
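A short base R sketch of assignment, the two native formats, and the data types listed above. The object names (scores, label) and the temporary file paths are made up for illustration; tempfile() stands in for "path/filename".

```r
# '#' starts a comment; '<-' assigns a value to a name.
scores <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

# RDS stores ONE object; you choose its name when reading it back.
rds_path <- tempfile(fileext = ".RDS")
saveRDS(scores, file = rds_path)
scores_copy <- readRDS(rds_path)

# RData stores one or MORE objects; load() restores them under their saved names.
label <- "exam 1"
rdata_path <- tempfile(fileext = ".RData")
save(scores, label, file = rdata_path)
rm(scores, label)   # remove them from the environment...
load(rdata_path)    # ...and 'scores' and 'label' exist again

# Checking the type of a value with class()
class(3.14)                   # "numeric"
class(2L)                     # "integer"
class(TRUE)                   # "logical"
class("text")                 # "character"
class(factor(c("a", "b")))    # "factor"
```

The practical difference: readRDS() returns the object so you can assign it to any name, while load() silently recreates objects under the names they had when saved.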
Download R materials from today’s canvas page!
Artwork by @allison_horst
Examining data:
- names()
- head() and tail()
- str(); glimpse() (dplyr equivalent)
- summary()

These (base R) commands will operate on the full object (all variables/columns in a data frame). To access a specific variable/column, use the $ operator: df$varname.
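These commands can be tried on mtcars, a small data frame that ships with R. The choice of mtcars and the names mpg_values and mean_mpg are just for illustration.

```r
# mtcars is a built-in data frame (datasets package), so no import is needed.
names(mtcars)      # column names
head(mtcars, 3)    # first 3 rows
tail(mtcars, 3)    # last 3 rows
str(mtcars)        # compact structure: column types and first values
summary(mtcars)    # summary statistics for every column

# dplyr::glimpse(mtcars)  # dplyr equivalent of str(), if dplyr is installed

# The $ operator pulls out a single column as a vector.
mpg_values <- mtcars$mpg
summary(mpg_values)            # summary statistics for just this column
mean_mpg <- mean(mtcars$mpg)
```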
Part of the tidyverse, dplyr is a package for data manipulation. The package implements a grammar for transforming data, based on verbs/functions that define a set of common tasks.
dplyr functions are for data frames: the first argument of a dplyr function is always a data frame.
select(.data, var1, var2, var3)
select() helpers include
- select(.data, var1:var10): select a range of columns
- select(.data, -c(var1, var2)): select every column except those listed
- select(.data, starts_with("string")): select columns whose names start with… (or ends_with("string"))
- select(.data, contains("string")): select columns whose names contain…

filter(.data, var == value)
| Logical tests | Boolean operators for multiple conditions |
|---|---|
| x < y: less than | a & b: and |
| x >= y: greater than or equal to | a \| b: or |
| x == y: equal to | xor(a, b): exclusive or (exactly one is true) |
| x != y: not equal to | !a: not |
| x %in% y: is a member of | |
| is.na(x): is NA | |
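The select() helpers and the filter() tests in the table can be sketched on the built-in mtcars data frame. This assumes dplyr is installed; the result names (by_range, four_cyl, and so on) are made up for illustration.

```r
library(dplyr)

# select(): keep columns by name, range, exclusion, or helper
by_name   <- select(mtcars, mpg, cyl, hp)
by_range  <- select(mtcars, mpg:hp)              # mpg, cyl, disp, hp
all_but   <- select(mtcars, -c(mpg, cyl))        # every column except mpg and cyl
d_columns <- select(mtcars, starts_with("d"))    # disp, drat
has_ar    <- select(mtcars, contains("ar"))      # gear, carb

# filter(): keep rows where the logical test is TRUE
four_cyl   <- filter(mtcars, cyl == 4)                 # equal to
efficient  <- filter(mtcars, mpg >= 25 & cyl == 4)     # and
four_six   <- filter(mtcars, cyl == 4 | cyl == 6)      # or
same_rows  <- filter(mtcars, cyl %in% c(4, 6))         # is a member of
not_eight  <- filter(mtcars, cyl != 8)                 # not equal to
no_missing <- filter(mtcars, !is.na(mpg))              # not NA
```

Note that == tests equality, while a single = is used for naming arguments; mixing them up inside filter() is a common early mistake.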
arrange(.data, var)
arrange(.data, desc(var))
count(.data, var)
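A brief sketch of arrange() and count(), again on the built-in mtcars data and assuming dplyr is installed. The result names are illustrative only.

```r
library(dplyr)

# arrange(): sort rows by a column, ascending by default
by_mpg      <- arrange(mtcars, mpg)
# desc() reverses the order, so the largest values come first
by_mpg_desc <- arrange(mtcars, desc(mpg))

# count(): one row per distinct value of cyl, with the number of rows in column n
cyl_counts <- count(mtcars, cyl)
cyl_counts
```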
The pipe (%>%) allows you to chain together functions by passing (piping) the result on the left into the first argument of the function on the right. It allows us to call a series of functions in sequence (read the pipe as “and then…”).
dataframe %>% filter(var1 > 0) %>% select(var1, var2, var3)
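To make the “and then…” reading concrete, here is the same pattern on mtcars, written once with the pipe and once as nested calls. This is a sketch assuming dplyr is installed (loading dplyr makes %>% available); piped and nested are illustrative names.

```r
library(dplyr)

# With the pipe: take mtcars, and then filter, and then select, and then arrange.
piped <- mtcars %>%
  filter(mpg > 20) %>%
  select(mpg, cyl, hp) %>%
  arrange(desc(mpg))

# Without the pipe: the same steps, but read from the inside out.
nested <- arrange(select(filter(mtcars, mpg > 20), mpg, cyl, hp), desc(mpg))
```

Both versions produce the same data frame; the piped version lists the steps in the order they happen.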